[Test] Extend test_essential_feature to also cover basic GPU workload leveraging CUDA Samples. #7401

Merged
gmarciani merged 2 commits into aws:develop from gmarciani:wip/mgiacomo/3160/test-essential-gpu-0519-1
May 20, 2026

Conversation

Contributor

@gmarciani gmarciani commented May 19, 2026

Description of changes

Extend test_essential_feature to also cover basic GPU workload leveraging CUDA Samples.

Tests

  • SUCCESS: test_essential_feature passed, and I verified that it uses the expected 5 equivalent flex instance types

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@gmarciani gmarciani added the skip-changelog-update (Disables the check that enforces changelog updates in PRs), 3.x, and Test labels May 19, 2026
@gmarciani gmarciani marked this pull request as ready for review May 20, 2026 07:39
@gmarciani gmarciani requested review from a team as code owners May 20, 2026 07:39
@gmarciani gmarciani changed the title [Test] Extend test_essential_feature to also cover basic GPU workload… [Test] Extend test_essential_feature to also cover basic GPU workload leveraging CUDA Samples. May 20, 2026
gmarciani added 2 commits May 20, 2026 10:12
…pes to reduce the risk of ICEs.

Flexible instance types are cached to reduce the number of EC2 requests.
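
The caching idea in the commit message can be illustrated with a small shell sketch. This is not the PR's code, which is not shown in this conversation. The function names, the cache layout, and the returned values are all hypothetical stand-ins. It only shows the general pattern: do the expensive lookup once per key and reuse the stored answer afterwards.

```shell
#!/bin/sh
# Hypothetical sketch of lookup caching; none of these names come from the PR.
CACHE_DIR=$(mktemp -d)   # one file per cached key
LOOKUPS=0                # counts how often the expensive path actually runs

# Stand-in for a remote EC2 query. The values it produces are made up.
fetch_equivalent_types() {
  LOOKUPS=$((LOOKUPS + 1))
  FETCHED="$1-alt1 $1-alt2"
}

# Sets FLEX_RESULT for the given key, querying only on a cache miss.
# Results go through global variables rather than $(...), because a command
# substitution runs in a subshell and the LOOKUPS counter would be lost.
get_flexible_types() {
  cache_file="$CACHE_DIR/$1"
  if [ ! -f "$cache_file" ]; then
    fetch_equivalent_types "$1"
    printf '%s\n' "$FETCHED" > "$cache_file"
  fi
  FLEX_RESULT=$(cat "$cache_file")
}

get_flexible_types "gpu.example"   # cache miss: performs the lookup
get_flexible_types "gpu.example"   # cache hit: served from the file
echo "lookups=$LOOKUPS result=$FLEX_RESULT"

rm -rf "$CACHE_DIR"
```

Two calls for the same key trigger a single lookup, which is the effect the commit message describes for EC2 requests.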
@gmarciani gmarciani force-pushed the wip/mgiacomo/3160/test-essential-gpu-0519-1 branch from f590098 to 0c357ec May 20, 2026 08:12
Contributor

@hehe7318 hehe7318 left a comment

Approve with a comment.

echo "Node: $(hostname)"
echo "Sample: $SAMPLE_REL"
echo "SLURM_JOB_GPUS=${SLURM_JOB_GPUS:-unset}"
echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-unset}"
Contributor

[Minor] Do we need to explicitly set --gres=gpu to ensure GPU is visible?

Contributor Author

Not in this case, because we are targeting the queue that only has compute resources (cr) with GPUs.
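
For readers following the exchange above, here is a minimal sketch of what the job script's diagnostic lines report. The helper name `describe_gpu_env` is hypothetical, and the second case only simulates the variables being exported. It does not run Slurm. The `${VAR:-unset}` expansion prints `unset` whenever the variable is undefined or empty, so the log shows at a glance whether a GPU assignment reached the job environment.

```shell
#!/bin/sh
# Hypothetical helper (not from the PR): print the GPU-related variables the
# same way the job script's echo lines do.
describe_gpu_env() {
  echo "SLURM_JOB_GPUS=${SLURM_JOB_GPUS:-unset}"
  echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-unset}"
}

# Case 1: neither variable is defined, so both lines fall back to "unset".
unset SLURM_JOB_GPUS CUDA_VISIBLE_DEVICES
NO_GPU_VARS_OUTPUT=$(describe_gpu_env)

# Case 2: simulate the variables being exported for a job holding GPU index 0.
SLURM_JOB_GPUS=0
CUDA_VISIBLE_DEVICES=0
export SLURM_JOB_GPUS CUDA_VISIBLE_DEVICES
WITH_GPU_VARS_OUTPUT=$(describe_gpu_env)

printf '%s\n---\n%s\n' "$NO_GPU_VARS_OUTPUT" "$WITH_GPU_VARS_OUTPUT"
```

If an explicit request were ever wanted, `sbatch` accepts `--gres=gpu:1` on the command line or as an `#SBATCH` directive. The author's reply explains why the test leaves it out.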

@gmarciani gmarciani merged commit 5efdf4b into aws:develop May 20, 2026
19 checks passed
@gmarciani gmarciani deleted the wip/mgiacomo/3160/test-essential-gpu-0519-1 branch May 20, 2026 11:29
Labels

3.x, skip-changelog-update, Test
